Posters
| Session A: (July 22 and July 23) | Session B: (July 24 and July 25) |
|---|---|
| Presentation Schedule for July 22, 6:00 pm – 8:00 pm | Presentation Schedule for July 24, 6:00 pm – 8:00 pm |
| Presentation Schedule for July 23, 6:00 pm – 8:00 pm | |
| Session A Poster Set-up and Dismantle | Session B Poster Set-up and Dismantle |
Short Abstract: The genomes of cancer cells are constantly reshaped during pathogenesis. This evolutionary process leads to the emergence of subclonal populations, which can limit therapeutic interventions through the emergence of drug-resistance mutations. Data derived from massively parallel sequencing can be used to infer these subclonal populations from tumor-specific point mutations. Accurate determination of copy number changes and tumor impurity is indispensable for reliably inferring these subclonal populations by mutational clustering. This protocol describes a copy number analysis method together with a novel mutational clustering approach, called Sclust. In a series of simulations and comparisons with alternative methods, we showed that Sclust accurately determines copy number states and subclonal populations. Performance tests showed that the entire method is computationally extremely efficient: copy number analysis and mutational clustering together take less than 10 minutes. Sclust is designed so that even non-experts in computational biology or bioinformatics with basic knowledge of the Linux/Unix command line should be able to carry out analyses with it.
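To illustrate why copy number and tumor purity matter for mutational clustering, the sketch below converts a variant allele frequency into a cancer cell fraction using the widely used purity and copy number correction. It is not taken from Sclust: the function name, parameter names, and default values are assumptions made for this example.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Estimate the fraction of cancer cells that carry a mutation.

    vaf          -- observed variant allele frequency (0..1)
    purity       -- fraction of tumor cells in the sample (0..1]
    tumor_cn     -- total copy number at the locus in tumor cells
    multiplicity -- number of mutated copies per mutated cell (assumed known)
    normal_cn    -- copy number in contaminating normal cells (2 on autosomes)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return vaf * avg_copies / (purity * multiplicity)


# A heterozygous mutation in a diploid region of a 60% pure tumor:
clonal = cancer_cell_fraction(vaf=0.30, purity=0.6, tumor_cn=2)
subclonal = cancer_cell_fraction(vaf=0.15, purity=0.6, tumor_cn=2)
```

With these inputs the first mutation is consistent with being present in all cancer cells and the second with being present in about half of them, which is the kind of separation a clustering step would then look for.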
Short Abstract: Translational research in many disease areas requires a longitudinal understanding of disease development and progression across all biologically relevant scales. Several corresponding studies are now available. However, to compile a comprehensive picture of a specific disease, multiple studies need to be analyzed and compared. A large number of clinical studies are nowadays conducted in the context of drug development in pharmaceutical research. However, legal and ethical constraints typically do not allow for sharing sensitive patient data. As a consequence, data "silos" exist, which slow down overall scientific progress in translational research. We suggest the idea of a Virtual Cohort (VC) to address this limitation. Our key idea is to describe a longitudinal patient cohort via a generative statistical model, namely a Bayesian Network, in conjunction with deep learning methods. We show that with the help of such a model we can simulate subjects that are largely indistinguishable from real ones. Our approach allows for incorporating arbitrary multi-scale, multi-modal data without making specific distribution assumptions. Moreover, we demonstrate the possibility of simulating counterfactual interventions (e.g. via a treatment) in the VC. Overall, our proposed approach opens the possibility of building sufficiently realistic VCs for multiple disease areas in the future.
Short Abstract: The majority of clinical trial failures are caused by low efficacy of investigated drugs, often due to a poor choice of target protein. Computational prioritization approaches aim to support target selection by ranking candidate targets in the context of a given disease. We propose a novel target prioritization approach, GuiltyTargets, which relies on deep network representation learning of a genome-wide protein-protein interaction network annotated with disease-specific differential gene expression and uses positive-unlabeled machine learning for candidate ranking. We evaluated our approach on six diseases of different types (cancer, metabolic, neurodegenerative) within a 10 times repeated 5-fold stratified cross-validation and achieved AUROC values between 0.92 and 0.94, significantly outperforming a previous approach, which relies on manually engineered topological features. Moreover, we showed that GuiltyTargets allows for target repositioning across related disease areas. Application of GuiltyTargets to Alzheimer’s disease resulted in a number of highly ranked candidates that are currently discussed as targets in the literature. Interestingly, one (COMT) is also the target of an approved drug (Tolcapone) for Parkinson’s disease, highlighting the potential for target repositioning with our method.
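The evaluation protocol named in this abstract (10 times repeated 5-fold stratified cross-validation scored by AUROC) can be sketched in plain Python as follows. This is a generic illustration, not the GuiltyTargets code: the function names, the round-robin fold assignment, and the seed handling are assumptions.

```python
import random


def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_kfold(labels, k=5, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class ratios per fold."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            # Deal the indices of this class round-robin across the folds.
            for position, i in enumerate(idx):
                folds[position % k].append(i)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test
```

In use, a ranking model would be fitted on each `train` split and `auroc` computed on the held-out `test` split, giving 50 values to summarize. In a positive-unlabeled setting the "negative" class consists of unlabeled candidates, so the resulting AUROC should be read with that in mind.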
Short Abstract: Pyrrole-Imidazole (PI) polyamides are DNA binders with strong affinity for the minor groove and sequence specificity, but the typical design involves short motifs that often lead to non-unique genomic binding and, consequently, to off-target effects. In our quest to design and optimize this class of polyamides for precision medicine, we have recently developed a machine-learning based method to infer possible phenotypic changes and side effects. Expression changes from RNA microarrays were used to assess possible outward phenotypic changes, manifested as side effects, should the same PI polyamide candidate be administered clinically. We validated some of these effects in a series of animal experiments and found that certain predicted side effects were corroborated.
Short Abstract: Chimeric Antigen Receptor (CAR) T-cell therapy has exhibited dramatic anti-tumor efficacy in clinical trials. In this study, we report the transcriptome profiles of bone marrow cells in four B-cell acute lymphoblastic leukemia (B-ALL) patients before and after CD19-specific CAR-T therapy. CD19-CAR-T therapy remarkably reduced leukemia cells, and three patients achieved bone marrow remission (MRD negative). The efficacy of CD19-CAR-T therapy in B-ALL was positively correlated with the abundance of CAR and immune cell sub-populations (e.g., CD8+ T cells and NK cells) in the bone marrow. Additionally, CD19-CAR-T therapy mainly influenced the expression of genes linked to cell cycle and immune response pathways, including “Natural killer cell mediated cytotoxicity” and “NOD-like receptor signaling pathway”. The regulatory network analyses revealed that miRNAs (e.g., miR-148a-3p/375) acting as oncogenes or tumor suppressors could regulate crosstalk between transcription factors (TFs, e.g., JUN/FOS) and histone genes (e.g., HIST1H4A/HIST2H4A) involved in CD19-CAR-T therapy. Furthermore, many lncRNAs showed a high degree of co-expression with TFs/histones (e.g., FOS/HIST1H4B) and were associated with immune processes. These transcriptome analyses provide important clues for further understanding the gene expression and related mechanisms underlying the therapeutic efficacy of CAR-T.
Short Abstract: In this study, we focus on the association of categorical variables with the times of two censored events. In time-series studies, time-to-event data are commonly censored because of limited budget or study duration, or for unknown reasons. As a result, part of the data is missing and only the last observed times are available. Conventional survival analysis has often been applied to censored time data, but the problem of multiple events is rarely considered. However, the recently developed software GAIT (Gene Expression Analysis for Interval Time), which in simulation studies achieved remarkable performance in discovering genes significantly associated with the interval time between two events, presents a solution to this problem. Building on this progress, we developed another software tool that handles categorical variables instead of a continuous gene expression matrix. To support the reliability of the implementation, we compared the performance of the software with some straightforward analyses in simulation studies. The results show that the software achieves higher performance than the other analyses.
Short Abstract: Recent genome-editing technologies have increased the popularity of the rat as a genetic model of disease. RGD collects datasets for rat diseases, phenotypes, strains, pathways, drug/chemical-gene interactions and comparative data for human, mouse, and other organisms. There are currently 197,813 rat, 582,160 human and 199,278 mouse disease annotations. RGD has created Disease Portals and tools that allow researchers to mine, integrate, and analyze both public data and the user's own experimental data within the context of existing RGD resources. A dataset can thus be investigated, for example, with OLGA, which assembles gene, QTL or strain data; with PhenoMiner, which compares data across experiments; or with InterViewer, which displays interactions. RGD is currently building a multi-organism platform integrating genomic elements for rat, human, mouse, dog, chinchilla, bonobo, and 13-lined ground squirrel. The translational platform will help to establish a Precision Model portal for preclinical research. Animal models advance precision medicine by increasing knowledge of gene function, of the effects of variations in different genetic backgrounds, and of the validation of therapies. Cataloging gene products and mapping abnormalities observed in model organisms to human disease phenotypes will help to identify potentially pathogenic gene variants and to develop individual treatment strategies. This is the direction RGD is currently pursuing.
Short Abstract: Lung adenocarcinoma (LUAD) is the predominant histological subtype of lung cancer, which is the leading cause of cancer death. Targeted therapies have so far brought only limited improvement in LUAD survival, suggesting that discovering new therapeutic targets is important. Non-coding RNAs (ncRNAs) are indispensable in cancer progression and have the potential to serve as theranostic biomarkers. However, few investigations have predicted the molecular mechanisms of prognostic ncRNAs in tumorigenic pathways in LUAD. NcRNA-based network prediction is necessary to understand the as yet unrevealed functions and mechanisms of ncRNAs. Here, we established an approach that uses LUAD cohorts to uncover prognostic ncRNAs and explore pathological mechanisms. After in silico and experimental validations, we prioritized PTTG3P over other prognostic ncRNAs for mechanistic studies. Up-regulation of PTTG3P increased cell proliferation and sustained tumor growth, leading to poor survival in lung orthotopic models. According to the core module from the systematic pipeline, PTTG3P was shown to cooperate with the transcription factor FOXM1 to regulate the transcriptional activity of the mitotic kinase BUB1B, facilitating tumor growth and leading to poor outcomes in LUAD patients. Together, we established a systematic strategy to uncover driving prognostic ncRNAs and confirmed that the up-regulated PTTG3P/FOXM1/BUB1B axis could be a theranostic target for LUAD; the strategy could also be applied to pan-cancer studies.
Short Abstract: Introduction: This study aimed to establish an integrated network of DNA methylation and RNA expression in an OVA-induced asthma model and to investigate epigenetically regulated genes related to the development of asthma. Methods: We performed whole-genome DNA methylation microarray analysis and RNA-sequencing on 3 lung samples from OVA-induced asthma mice. Results: A total of 35,401 DMRs were identified between OVA-induced asthma and control. Of these, 1010 sites were located in promoter regions, and 370 of the corresponding genes showed an inverse correlation between methylation and gene expression. In KEGG pathway analysis, 368 genes were up- or down-regulated in the OVA-induced asthma samples; these genes belonged to the chemokine signaling, leukocyte transendothelial migration and vascular smooth muscle contraction pathways. Integrated network analysis identified 4 hub genes, consisting of 3 upregulated genes (FOXO1, SP1, APP) and 1 downregulated gene (RUNX1). The four hub genes showed a correlation between DNA methylation and gene expression and were highly interconnected, functionally significant nodes in the IPA module. Conclusion: We identified connected hub genes, including FOXO1, RUNX1, SP1 and APP, in integrated networks of DNA methylation and gene expression in the development of asthma. These results indicate that modulating these four genes could be effective in controlling asthma.
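The "inverse correlation between methylation and gene expression" filter can be pictured with a small sketch. The abstract does not state which statistic or thresholds were used, so the simple sign comparison below, the function name, and the placeholder gene identifiers are all assumptions for illustration.

```python
def inversely_regulated(methylation_delta, expression_log2fc):
    """Return genes whose promoter methylation and expression move in opposite directions.

    methylation_delta -- {gene: methylation difference, asthma minus control}
    expression_log2fc -- {gene: log2 fold change in expression, asthma vs control}
    """
    hits = {}
    for gene, d_meth in methylation_delta.items():
        log2fc = expression_log2fc.get(gene)
        # Skip genes without expression data or without any change.
        if log2fc is None or d_meth == 0 or log2fc == 0:
            continue
        if (d_meth > 0) != (log2fc > 0):
            hits[gene] = "hypermethylated/down" if d_meth > 0 else "hypomethylated/up"
    return hits


# Placeholder identifiers and values, not data from the study.
example = inversely_regulated(
    {"GENE_A": 0.25, "GENE_B": -0.5, "GENE_C": 0.5, "GENE_D": -0.25},
    {"GENE_A": -1.5, "GENE_B": 2.0, "GENE_C": 1.0},
)
```

Here `GENE_A` and `GENE_B` are retained, `GENE_C` changes in the same direction for both measurements, and `GENE_D` has no expression value.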
Short Abstract: Rheumatoid arthritis (RA) is a complex disease with a fluctuating course of progression. Although treatments have improved in recent years, response is not guaranteed. The aim of this study was to identify heterogeneity in response to treatment in RA. Longitudinal patient data for 485 RA patients receiving anti-IL6 treatment were extracted from a biomarker sub-study of a phase III clinical trial. Latent class mixed models were used to identify distinct trajectories of disease activity using the DAS28 activity score after treatment initiation. Clinical measurements were then analysed to characterise patients by serological biomarkers and demographics. Three distinct trajectories of drug response were identified using latent class analysis. Class 1 showed the least reduction in DAS28, with a high proportion of patients seeking escape therapy. Class 3 showed significantly higher rates of improvement in DAS28, achieving remission. Whilst baseline demographics were not predictive of class membership, tender joint count and erythrocyte sedimentation rate were significantly different at baseline (p<0.05). Analysis of longitudinal biomarkers identified significant differences in the change from baseline in levels of matrix metalloprotease 3 and CRP (p<0.05). Statistical learning allowed the identification of distinct treatment response trajectories. Identification of homogeneous patient populations of response may allow for more targeted therapeutic treatment regimens.
Short Abstract: In the literature, the problem of clustering multivariate short time series is still largely unaddressed. However, multivariate short time series are common in clinical data, when multivariate patient measurements are taken over time. The clustering (stratification) of such clinical data is additionally complicated by the typically high degree of missingness. For this purpose, we developed variational deep embedding with recurrence (VaDER). VaDER extends variational deep embedding (VaDE), a clustering algorithm built on variational autoencoder principles. VaDER enables the analysis of multivariate short time series with many missing values, by (1) incorporating long short term memory networks (LSTMs) into VaDE's architecture, and (2) defining an architecture and loss function that directly deal with missing values by implicit imputation and loss re-weighting. We technically validated VaDER by accurately recovering clusters from noisy simulated data with known ground truth clustering. We then used VaDER to successfully stratify (1) Alzheimer's disease patients and (2) Parkinson's disease patients into subgroups characterized by clinically divergent disease progression profiles. Additional analyses demonstrated that these clinical differences reflected significant underlying biological differences. We believe our results show that VaDER can be of great value for future efforts in patient stratification, and multivariate short time series clustering in general.
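The idea of a loss that handles missing values through re-weighting can be shown with a minimal sketch: squared errors are summed over observed entries only and normalized by the number of observed entries, so a series with many gaps is not artificially down-weighted. This is an assumption-laden illustration, not VaDER's actual loss; the function name and the use of `None` to mark missing measurements are choices made for this example.

```python
def masked_mse(series, reconstruction):
    """Mean squared reconstruction error over observed entries only.

    series         -- list of time points, each a list of values (None = missing)
    reconstruction -- model output with the same shape, no missing values
    """
    squared_error, n_observed = 0.0, 0
    for row, row_hat in zip(series, reconstruction):
        for value, value_hat in zip(row, row_hat):
            if value is None:
                # A missing measurement contributes no error term.
                continue
            squared_error += (value - value_hat) ** 2
            n_observed += 1
    # Re-weight by the observed count rather than the full series length.
    return squared_error / n_observed if n_observed else 0.0


# Two time points, two variables, one missing measurement.
loss = masked_mse([[1.0, None], [2.0, 4.0]], [[1.0, 9.0], [3.0, 2.0]])
```

In a neural network setting the same masking would be applied to tensors, and the reconstructed values at missing positions act as the implicit imputation.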
Short Abstract: The DisGeNET platform contains information about the genetic determinants of human diseases and traits. Initially developed as a Cytoscape plugin, it has evolved into different formats and applications, and it is now in its 6th release. The current version of DisGeNET contains more than 600,000 gene-disease associations, between 17,000 genes and 24,000 diseases and phenotypes. The platform also includes 210,000 variant-disease associations, between 117,000 variants and 10,000 diseases and traits. The data have been obtained by integrating information from a dozen repositories with associations extracted from Medline abstracts using state-of-the-art text mining technologies, with a tool developed for this purpose. DisGeNET offers a suite of bioinformatics tools to facilitate the exploration and analysis of data: a web interface, a Cytoscape App, a new API, an R package and Linked Data resources. The DisGeNET-RDF SPARQL endpoint is one of the Elixir Recommended Interoperability Resources. DisGeNET is an established resource, with approximately 20,000 users per year, used to address a variety of research questions in disease genomics, disease comorbidity, drug development and toxicity studies. It thus constitutes a powerful tool to facilitate translational research and boost precision medicine.
Short Abstract: The clinical efficacy of therapeutic monoclonal antibodies for breast and colorectal cancer has greatly contributed to the improvement of patients’ outcomes by individualizing their treatments. However, primary or acquired resistance to treatment reduces its efficacy. In this context, the identification of biomarkers predictive of drug response would support research and development of alternative treatments. Currently, several molecular biomarkers of treatment response for breast and colorectal cancer have been described. However, this information is scattered across several resources and not properly integrated, hindering its potential use. Therefore, there is a need for resources that offer biomarker data in a harmonized manner to support the identification of actionable biomarkers of response to treatment in cancer. ResMarkerDB was developed as a comprehensive resource of biomarkers of drug response in colorectal and breast cancer. It integrates data from existing repositories with new data extracted and curated from the literature (referred to as ResCur). The database contains more than 500 biomarker-drug-tumour associations. ResMarkerDB provides a web interface (http://www.resmarkerdb.org) to facilitate the exploration of the current knowledge of biomarkers of response in breast and colorectal cancer. It aims to enhance translational research efforts in identifying actionable biomarkers of drug response in cancer.
Short Abstract: The challenges in drug discovery, including high attrition rates in the late development stage, are well documented. This has led to increased interest in, and need for, applying machine learning and artificial intelligence across the drug discovery pipeline, from target identification to chemical lead selection and optimisation. It has also been demonstrated that drugs with human genetic validation data are more likely to succeed in the clinic. To address this, it is essential to unravel genetic networks to identify new or better targets for which the underlying mechanism is clear. Despite the significant advances in next-generation sequencing technologies and evolving databases of patient cohorts, the sheer complexity of these datasets makes their integration and interrogation a daunting task. Through the development and application of cutting-edge computational approaches, such as artificial intelligence, machine learning and mathematical modelling, to pharmacogenomics and drug discovery, we identify novel therapeutic targets, biomarkers and drug repositioning opportunities. We developed a computational platform that performs (1) systematic integration and harmonisation of biomedical big data, (2) multi-omic disease association studies and (3) network theory-based analysis of targetable pathways. The platform has significant potential to provide unprecedented insights into vital biological processes and the control hubs that underpin disease.
Short Abstract: Insulin resistance (IR) and obesity differ among ethnic groups in Singapore, with the Malays more obese yet less IR than Asian-Indians. However, the molecular basis underlying these differences is not clear. We used an integrative-omics approach to investigate the molecular basis underlying the ethnic differences in IR, specifically investigating pre-diabetic subjects from the three major ethnicities in Singapore. We integrated skeletal muscle (SM) transcriptomic, genomic, and phenotypic data to identify molecular pathways in SM that are associated with ethnic differences in IR, obesity, and related traits. To further infer causality we integrated our findings with large GWAS datasets for enrichment of previously associated genetic loci. We identified a network of 46 genes that were specifically down-regulated in Malays, suggesting dysregulation of components of cellular respiration in SM of Malay individuals. We also identified 28 differentially expressed gene clusters, four of which were enriched for genes found in GWAS of metabolic traits and disease. This study identified extensive gene expression changes in SM between the three Singaporean ethnicities and highlights specific genes and molecular pathways that may help explain the differences in IR among these ethnic groups.
Short Abstract: Understanding the complex immune response to cancer is a prerequisite to develop new therapeutic strategies. The interplay between tumor cells and host immune system is reflected in the composition and heterogeneity of the tumor microenvironment (TME). Here, we present a novel approach to quantify spatial TME heterogeneity while facilitating intuitively accessible visualizations using whole-slide multiplex immunofluorescence images from 48 resected NSCLC samples. Using a grid-based approach, local microenvironment composition was captured as local co-expression vectors across tissue sections. Pooled vectors from all samples were clustered into representative patterns that describe local TME composition and were subsequently used to generate tumor maps. Pattern frequency-derived patient profiles were used to build a joint immune landscape and correlated against clinical covariates. We detected local patterns characterizing distinct immune states including non-immunogenic, macrophage- or proliferation-dominated and highly inflamed areas. The resulting immune landscape revealed patient subsets not visible using global marker expression. Mapping clinical parameters into this landscape unveiled an association between tumor stage and TME composition, suggesting a temporal order of selected immune states. Our method facilitates unsupervised identification of patient subsets using local immune patterns that would not be accessible using traditional microscopy approaches, thereby paving the way for immune status-driven stratification.
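The grid-based step described above can be sketched as follows: cells are binned into square tiles, and each tile is summarized by a composition vector that could then be pooled across samples and clustered. This is a simplification under stated assumptions. The abstract works with marker co-expression from multiplex immunofluorescence, whereas this example uses one discrete phenotype label per cell; the function name, tile size, and phenotype labels are hypothetical.

```python
from collections import Counter, defaultdict


def tile_composition(cells, tile_size, phenotypes):
    """Summarize local composition on a regular grid.

    cells      -- iterable of (x, y, phenotype) tuples in slide coordinates
    tile_size  -- edge length of a square tile, in the same units as x and y
    phenotypes -- ordered list of phenotype labels defining the vector layout
    Returns {(column, row): [fraction of cells per phenotype]}.
    """
    counts = defaultdict(Counter)
    for x, y, phenotype in cells:
        tile = (int(x // tile_size), int(y // tile_size))
        counts[tile][phenotype] += 1
    vectors = {}
    for tile, counter in counts.items():
        total = sum(counter.values())
        # Labels outside `phenotypes` still count toward the tile total.
        vectors[tile] = [counter[p] / total for p in phenotypes]
    return vectors


# Three hypothetical cells on a slide split into 100-unit tiles.
vectors = tile_composition(
    [(10, 10, "tumor"), (20, 30, "tcell"), (150, 20, "tumor")],
    tile_size=100,
    phenotypes=["tumor", "tcell"],
)
```

The per-tile vectors from all samples would then be pooled and passed to a clustering method of choice, and the frequency of each resulting pattern per patient gives the kind of patient profile the abstract describes.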
Short Abstract: Motivation: The collection of various ‘-omics’ data associated with clinical datasets opens up new perspectives in terms of personalized medicine. In order to predict patient survival, the Cox model links overall survival to RNA levels in the patient's tumor. Lasso, Elastic Net (EN) and Adaptive Elastic Net (AEN) are popular penalization methods to select prognostic biomarkers. We evaluate these methods in terms of stability, selection and prediction for both microRNA (p ~ n) and messenger RNA (p >> n) datasets for kidney renal clear-cell carcinoma (KIRC) from TCGA. Results: First, on average only 20% of genes are selected in common in two independent datasets for EN when p ~ n. EN slightly outperforms Lasso and AEN in terms of both stability and selection, but the true positive rate remains below 0.48 and the false discovery rate above 0.58 in simulations. Despite these relatively poor selection performances, all three methods perform equally well in predicting patient survival probability, which is highly anti-correlated with the stratified prognostic index (PI). In addition, individual patient survival prediction is accurate for nearly 18% of the patients for whom PI is above 1, corresponding to patients with poor prognosis.
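The selection metrics quoted in this abstract (true positive rate, false discovery rate, and the share of genes selected in common across datasets) can be computed as in the sketch below. The abstract does not say exactly how the share of common genes is defined, so using the Jaccard index here is an assumption, as are the function names and the placeholder gene identifiers.

```python
def selection_metrics(selected, truly_prognostic):
    """Return (true positive rate, false discovery rate) of a selected gene set."""
    selected, truth = set(selected), set(truly_prognostic)
    true_positives = len(selected & truth)
    tpr = true_positives / len(truth) if truth else 0.0
    fdr = (len(selected) - true_positives) / len(selected) if selected else 0.0
    return tpr, fdr


def shared_fraction(selected_a, selected_b):
    """Jaccard index of two selections, used here as a stability measure."""
    a, b = set(selected_a), set(selected_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Placeholder gene identifiers, as in a simulation with known ground truth.
tpr, fdr = selection_metrics(["g1", "g2", "g3", "g4"], ["g1", "g2", "g5"])
stability = shared_fraction(["g1", "g2"], ["g2", "g3"])
```

In this toy case two of the three truly prognostic genes are recovered, half of the selected genes are false discoveries, and one gene out of three is shared between the two runs.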
Short Abstract: Background: Transplantation of lungs from donation after cardiac death (DCD) donors, in addition to donation after brain death (DBD) donors, has become routine worldwide to address the global organ shortage. This was made possible by the development of ex vivo lung perfusion (EVLP) for assessment and repair. We hypothesize that there are differences between lungs from DBD and DCD donors, and also between EVLP and non-EVLP lungs, and that these differences will lead to a better understanding of injury-mediated mechanisms and to the discovery of specific therapeutic targets. Methods: We analyzed microarray data from human DBD and DCD donor lungs collected at the end of cold ischemic time. Differential expression (DE) analysis was performed in R. For pathway and network analysis we employed IPA and STRING. Validation was performed with multiple logistic regression and 10-fold cross-validation. Results: Lungs from DBD donors have increased activation of inflammatory pathways. In contrast, cell death, apoptosis and necrosis were activated in lungs from DCD donors. A novel panel of genes strongly differentiates the EVLP group from the non-EVLP group. Conclusion: These results lay the basis for further investigations into therapies to repair donor-dependent lung injuries, which in turn will increase the number of lungs available for transplantation.
Short Abstract: Background: Although the ex vivo lung perfusion (EVLP) technique is currently used to assess the function of marginal donor lungs, it also has the potential to be used as a platform for lung repair. We hypothesized that commonly enriched pathways shared between transplantation and EVLP may reveal common mechanisms of injury and represent potential therapeutic targets for lung repair prior to transplantation. Methods: Gene expression was measured from human lung biopsies, with 46 pre/post-transplant pairs and 49 pre/post-EVLP pairs. Gene set enrichment analysis identified gene set clusters enriched by transplantation and EVLP. Gene sets were clustered together. Gene set clusters were then classified as being predominant to transplant, predominant to EVLP or common. Results: Commonly enriched gene set clusters were associated with activation of innate inflammation, cell death, heat stress and downregulation of metabolism. Notably, the TLR/MYD88 signaling gene set cluster had the greatest number of nodes and was well connected with other inflammatory clusters. These mechanisms have been previously speculated as major mechanisms of acute lung injury in animal models. Conclusion: EVLP and transplantation both enrich for pathways associated with innate inflammation and cell death. These common pathways may represent therapeutic targets which can be utilized for lung repair.
Short Abstract: Cancer cells interact with their microenvironment during tumor progression, changing their phenotypic states. This challenges the field of precision medicine, which is currently not optimized for the individual patient. We now have the ability to obtain highly resolved molecular phenotypes from individual cells in patient samples, which can be used to define cell states and study cellular responses to drugs. We present two network-based computational frameworks, STAMP and DRUGNEM, which have the potential, respectively, to precisely determine the dynamic state of a disease and to individualize combination therapy for a given patient, with applications in lung cancer and leukemia. STAMP combines mass cytometry time-series data with machine learning to predict the states of tumor cells from 5 lung cancer patients using a reference Epithelial Mesenchymal Transition (EMT) map trained with a neural network. DRUGNEM is used to individualize therapy for 30 ALL patients. Instead of first identifying a mutation in the DNA and then searching for a drug that addresses that mutation, DRUGNEM starts from single cells isolated from the patient and tests those cells against a set of drugs to see which drug combinations are effective against the tumor, optimizing early intracellular responses using nested effects models.
Short Abstract: Immunotherapy is one of the promising therapeutic approaches for cancer. Although the efficacy of these drugs has been demonstrated, treatment response is only observed in a subset of patients. Therefore, it is important to use effective biomarkers to stratify the patients who can potentially respond to these therapies. In this study, 18 DNA repair genes were selected based on their mutation frequencies in common cancers. Subsequently, using somatic mutation information from The Cancer Genome Atlas (TCGA) MC3 project, the association between the mutational landscape of these 18 DNA repair genes and immunotherapy treatment response was investigated across different cancers. Since tumor mutational burden (TMB) and microsatellite instability (MSI) status are two potential biomarkers of immunotherapy response, the association of the mutational landscape of these DNA repair genes with TMB and with MSI status was examined first. The results demonstrated that in most cancers, mutation of these genes was significantly associated with high TMB and/or MSI-high status. Further, using treatment response information from independent datasets, survival analysis was performed to validate the association between the mutational landscape of the selected DNA repair genes and treatment response, indicating that these genes might serve as biomarkers for cancer immunotherapy in different cancers.
Short Abstract: The causative agent of acquired immune deficiency syndrome (AIDS) is HIV-1, which infects vital cells of the human immune system. The lack of effective medicine against human immunodeficiency virus (HIV) costs millions of lives. Transcription, an important part of the HIV-1 life cycle, is primarily involved in the maintenance of viral latency. Generally, both viral and cellular transcription factors, including transcription activators, suppressor proteins and epigenetic factors, are involved in the transcription process in the host cell genome. Among these, the virus-encoded transcriptional activator Tat is the chief contributor to transcription; it activates transcriptional initiation and elongation by interacting with cellular positive transcription factors. Viral latency occurs when expression of the autoregulatory viral trans-activating factor Tat is low, at sub-threshold levels. Currently, optimal combined antiretroviral therapy (cART) to suppress HIV-1 replication has led to a dramatic improvement in patients; however, viraemia rebounds after cART cessation, and the virus persists in latent reservoirs as an integrated, replication-competent provirus in each individual. Herein, we create a platform to understand the biological mechanism of HIV-1 viral reservoirs and to combat latency, using computational and experimental strategies to diminish viral infection.
Short Abstract: Pancreatic ductal adenocarcinoma (PDAC) is the seventh leading cause of cancer mortality worldwide. Little is known about predictive markers for long-term survival. Our study is the first to perform extensive integrative individual and group-based transcriptome profiling in PDAC patients with long-term (LT) and short-term (ST) survival. Using a discovery cohort of 19 PDAC patients from CHU-Liège (Belgium), we first identified differentially expressed genes (DEGs) between LT and ST survivors. Second, we applied unsupervised systems biology approaches to obtain meaningful gene modules. In particular, important modules obtained via weighted gene co-expression network analysis (WGCNA) showed significant correlation with clinical features, including overall survival, tumor size, and tumor invasion. Next, we created individual-level perturbation profiles. Careful inspection of individual-specific omics changes across LT survivors revealed detailed biological signatures associated with focal adhesion and ECM receptors that are relevant to PDAC survival. Finally, we identified prioritized cancer genes by integrating group and individual-specific DEGs on a directed functional interaction network. This highlighted HLA-DQA1, TAC1 and KCNH7, which are associated with conservative stretches of miRNA targets and show a role in PDAC survival. We believe that this study demonstrates an effective approach to elucidate the functional repercussions of integrative group and individual-specific transcriptome profiling.
Short Abstract: The chemotherapeutic response of cancer cells to a given compound is one of the most fundamental pieces of information required to design anti-cancer drugs. Recently, a considerable amount of drug-induced gene expression data has become publicly available, in addition to cytotoxicity databases. These large data sets provide an opportunity to apply machine learning methods to predict drug activity. However, due to the complexity of cancer drug mechanisms, none of the existing methods is perfect. In this paper, we propose a novel ensemble learning method to predict drug response. In addition, we attempt to use the drug screen data together with two novel signatures produced from the drug-induced gene expression profiles of cancer cell lines. Finally, we evaluate predictions by in vitro experiments in addition to the tests on data sets.
Short Abstract: Ageing is the major risk factor for many diseases. With the rise in life expectancy, the overall burden of ageing-related diseases increases. The molecular link between ageing and age-related diseases, however, has not been explored in a systematic manner. In this study, we test whether diseases with similar age of onset share a genetic component that is also implicated in ageing. We perform GWAS on UK Biobank data, which includes genomic, medical and lifestyle measures for almost 500k participants. Our preliminary analysis comparing more than 100 diseases based on their age-of-onset profiles suggests that late-life diseases do share a genetic component that is not prevalent in other diseases. Moreover, these results cannot be explained only by disease categories (e.g. cardiovascular, endocrine) or comorbidities. In order to explore the link between ageing and these diseases, we are now combining our results with publicly available datasets for ageing, such as age-series gene expression profiles and lifespan assays using model organisms. Identifying a shared ageing-related mechanism among multiple diseases offers an opportunity to target or even prevent multiple pathologies with a limited number of drugs and to decrease the effect of polypharmacy on the elderly while retaining the benefits.
Short Abstract: Through new publishing and funding guidelines, the amount of publicly available data is rapidly increasing. This wealth of data has made it common practice to reanalyse existing datasets in order to form hypotheses or validate one's own results. Often, datasets that address a matching research question use different ‘omics approaches, which complicates a direct analysis. Here we present a new analysis service of the Reactome pathway database that supports multi-omics, quantitative, comparative pathway analysis (http://www.reactome.org). The ReactomeGSA service provides an API that supports the simultaneous analysis of multiple datasets from heterogeneous ‘omics approaches. Currently, proteomics, RNA-sequencing and microarray experiments are supported as input data. The service offers gene set analyses as well as gene set variation analyses, where gene/protein-level expression values are mapped to the pathway level. The results are visualized using Reactome’s pathway browser application. There, the results of every experiment are shown in sequential order, making it easy to quickly spot similarities and differences between the analysed datasets. We additionally provide the ReactomeGSA R package that acts as an interface to this public API. In summary, our new ReactomeGSA service supports the easy comparative analysis of multi-omics datasets on the pathway level.
Short Abstract: Developing drugs with anticancer activity and low toxic side-effects at low costs is a challenging issue for cancer chemotherapy. In this work, we propose to use molecular pathways as the therapeutic targets, and develop a novel computational approach for drug repositioning for cancer treatment. We analyzed chemically-induced gene expression data of 1112 drugs on 66 human cell lines and searched for drugs that inactivate pathways involved in the growth of cancer cells (cell cycle) and activate pathways that contribute to the death of cancer cells (e.g., apoptosis and p53 signaling), where we computed the pathway-based anticancer drug likeness score for each drug based on the enrichment of the selected pathways. The proposed pathway-based method outperformed the previous individual genes-based methods. Finally, we performed a large-scale prediction of potential anticancer effects for all the drugs and experimentally validated the prediction results via three in vitro cellular assays that evaluate cell viability, cytotoxicity, and apoptosis induction. Using this strategy, we successfully identified several potential anticancer drugs with anticancer effects specific to cancer cells and no toxic side-effects on normal cells. The proposed pathway-based method has great potential to improve drug repositioning for cancer treatment.
Short Abstract: The majority of tumour samples studied by high-throughput transcriptomics are heterogeneous at three levels: (1) bulk samples contain a mixture of several cell types; (2) cancers naturally develop inter and intra-tumour heterogeneity of malignant cells; (3) the evolving technology may introduce technical biases and limit comparison of new patient data to large publicly available datasets. Here we propose to use a data-driven deconvolution method – consensus independent component analysis (ICA) to decompose heterogeneous transcriptomics data and extract features suitable for patient diagnostics and prognostics. The method separates biologically-relevant transcriptional signals from technical effects and provides information about cellular composition and biological processes. The method was applied to RNA-seq and microarray data originated from several studies on patients with glioblastoma and low grade gliomas. We also validated the approach on in-house data from GBM patient tumour specimens and corresponding patient-derived cell lines and xenograft models in vivo. The proposed method efficiently cleans the heterogeneous datasets from technical biases and allows making diagnostic and prognostic conclusions about the new patients. In addition, it provides information about biological processes activated in the new patient tumours and can be applied for stratification of tumour and stroma-specific signals.
Short Abstract: Hypoxia, the deprivation of oxygen in a tissue, is a key hallmark of the tumor microenvironment that can drive angiogenesis and radiotherapy resistance, and be predictive of tumor recurrence and survival. Thus, having an accurate assessment of hypoxia within a tumor can inform patient treatments. Unlike previous approaches, we present a tissue-specific scoring method based on linear discriminant analysis which is trained on data from The Cancer Genome Atlas (TCGA) and Genotype Tissue Expression project (GTEx) and can be applied to new samples. We validated our score against external datasets to show that the score is predictive of survival differences and is indicative of genomic markers well-known to be correlated with hypoxia.
Short Abstract: Many cancer treatments are associated with serious side effects, while they often only benefit a subset of the patients. Therefore, there is an urgent clinical need for tools that can aid in selecting the right treatment at diagnosis. Here we introduce simulated treatment learning (STL), which enables prediction of a patient’s treatment benefit. STL uses the idea that patients who received different treatments, but have similar genetic tumor profiles, can be used to model their response to the alternative treatment. We apply STL to two multiple myeloma gene expression datasets, containing different treatments (bortezomib and lenalidomide). We find that STL can predict treatment benefit in both; a hazard ratio (HR) between treatments of 0.5 is observed for bortezomib for 19.8% and an HR of 0.36 for lenalidomide for 31.1% of the patients. We further test STL on a breast cancer dataset with estrogen receptor-positive patients who either received tamoxifen or not. Here we find an HR of 0.68 in class benefit (66.8% of patients) and an HR of 1.35 in class ‘no benefit’, indicating these patients experienced harm from receiving tamoxifen. This demonstrates that STL can derive clinically actionable gene expression signatures that enable a more personalized approach to treatment.
Short Abstract: Background: Lung cancer is the leading cause of cancer related deaths. It is important that patients are diagnosed and treated in early tumour stages before development of distant metastasis. We investigated the early spread and genomic evolution of cancer cells by analyzing CNV profiles of primary tumours (PTs) and matched disseminated cancer cells (DCCs) from lymph nodes. Methods: We collected 65 samples from PTs and lymph node DCCs, for which we performed array CGH. CNV profiles were pre-processed and aligned to each other to best reflect common genomic alterations and identify shared aberration events. The dynamics of metastatic progression were characterized by risk modelling, linear classification and statistical testing previously developed by us for melanoma spread (PMID:29426936). Results: As compared to our preceding mCGH-based analysis (PMID:29426936) substantially more effort is required to pre-process and align aCGH data from different samples (PT, DCC) and patients. We developed improved methods for centralization, discretization and alignment of CNV profiles and devised an alternative to the Ziggurat event decomposition (PMID:21527027). In addition, results on dissemination and colonization dynamics will be presented. Conclusion: Improved methods for cross- and individual patient analysis of CNV profiles enable deeper understanding of early cancer spread.
Short Abstract: Rare disease gene prioritization approaches rely on high quality curated resources containing disease, gene and phenotype annotations. However, effectiveness of such approaches is constrained by the limited recall and high curation cost of annotated data. We develop a tool PRIORI-T for rare disease gene prioritization that takes an input set of phenotypes describing a clinical case. PRIORI-T makes use of rare disease correlation pairs extracted from MEDLINE involving human rare diseases, phenotypes and genes. Further, the correlation pairs are augmented using novel associations inferred using the information propagation algorithm GCAS (Graph Convolution-based Association Scoring) and an association network is constructed. The gene prioritization performance of PRIORI-T was validated using the phenotype descriptions of 230 real-world rare disease clinical cases collated from recent publications. PRIORI-T achieved an overall AUC score of 97% on the Orphanet disease gene associations curated from literature. For the clinical cases, the causal genes were captured within Top-50 and Top-300 for more than 40% and 72% of the cases respectively. PRIORI-T outperformed other competing approaches for gene prioritization that rely primarily on curated resources. Combining PRIORI-T with variant prioritization tools could further improve the accuracy of identifying causal genes.
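The Top-50 and Top-300 figures above are top-k hit rates: the fraction of clinical cases whose causal gene appears among the first k genes of the ranked output. A minimal sketch of that metric follows, assuming one known causal gene per case; the function name, gene labels and rankings are hypothetical toy values and are not PRIORI-T output.

```python
def top_k_hit_rate(ranked_genes_per_case, causal_gene_per_case, k):
    """Fraction of cases whose causal gene is among the first k ranked genes."""
    if not causal_gene_per_case:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for ranked, causal in zip(ranked_genes_per_case, causal_gene_per_case)
        if causal in ranked[:k]
    )
    return hits / len(causal_gene_per_case)


# Toy, made-up rankings for three hypothetical clinical cases.
rankings = [
    ["G1", "G2", "G3"],
    ["G4", "G5", "G6"],
    ["G7", "G8", "G9"],
]
causal = ["G2", "G6", "GX"]  # "GX" never appears in its ranking

rate_top2 = top_k_hit_rate(rankings, causal, 2)  # only the first case is a hit
rate_top3 = top_k_hit_rate(rankings, causal, 3)  # first and second cases are hits
```

Reporting the rate at several cut-offs, as the abstract does, shows how quickly the causal gene is recovered as the candidate list grows.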
Short Abstract: Background: Mouse models indicate that cancer cell dissemination occurs extremely early; however, the timing in humans is unknown. We thus determined the time point of metastatic seeding relative to tumour thickness and genomic alterations in melanoma. Methods: We used mathematical risk modelling, linear classification and statistical testing to delineate the cancer cell dissemination and lymph node colonization dynamics and determine the likely origin of genomic alterations, i.e. whether aberrations were initially acquired within the primary tumour or later in the draining (sentinel) lymph node. Results: We find that lymphatic dissemination occurs early, at a median primary tumour thickness of ~0.5 mm. Typical driver changes, including BRAF mutation and gained or lost regions comprising genes like MET or CDKN2A, are acquired within the lymph node at the time of colony formation. These changes define a colonisation signature that was linked to xenograft formation in immunodeficient mice and death from melanoma. Conclusion: Melanoma cells leave primary tumours early and evolve at different sites in parallel. We propose a model of metastatic melanoma dormancy, evolution and colonisation that will inform direct monitoring of adjuvant therapy targets. (1) Nat Commun. 2018 Feb 9;9(1):595, doi: 10.1038/s41467-017-02674-y, PMID: 29426936
Short Abstract: Current clinical practice to diagnose acute infections is suboptimal. Initial diagnosis is guided by non-specific physiological symptoms (e.g. presence of fever). Confirmatory tests of the presence or type of infection (detection of pathogens) can be time-consuming and inconclusive. Recent work identified 29 mRNA biomarkers from patient blood whose expression indicate infection status and disease severity. We identified subsets (modules) of the biomarkers showing common expression patterns in different aspects of the infection. Leveraging this structure, we develop a novel, diagnostic neural-network model called ‘SpokeNet’. In this model, we learn subnetworks (‘spokes’) that transform and summarize the expression of genes of a given module, combining outputs of each ‘spoke’ to predict infection status. To train and evaluate the model, we compiled and co-normalized a set of 18 different transcriptomic studies of patients (N=1,092) with physician-adjudicated labels of bacterial infection, viral infection and non-infectious inflammation. We performed a comprehensive hyperparameter search to identify promising network architectures and demonstrate the model’s classification performance in comparison to fully connected networks, networks trained on deterministic transforms of the expression modules, and non-neural-network classifiers. More broadly, the model provides a general framework for classification from differentially expressed biomarkers that may successfully account for patient heterogeneity.
Short Abstract: The noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) is a new entity in thyroid tumor classification, introduced in 2017. Papillary thyroid cancer (PTC) is the most common thyroid cancer. The aim of our study was to detect somatic mutations in NIFTP and PTC. In total, 25 tumors (7 NIFTPs and 18 PTCs) and 12 normal thyroid samples, obtained from 12 patients, were subjected to whole exome sequencing. Somatic single nucleotide variants, small insertions, small deletions and copy number variants were called. Our analysis detected HRAS and BRAF mutations in PTC cases. A BRAF mutation was found in one NIFTP case, whereas NRAS mutations were observed in two NIFTP cases. These results prompted us to reevaluate all histopathological NIFTP samples. After in-depth analysis of the paraffin blocks, four NIFTP cases were reclassified. Finally, only 3 cases were identified as NIFTP (updated criteria 2018). Our study showed that evaluation of BRAF mutation may support histopathological diagnosis of NIFTP, particularly when it is not possible to examine the entire tumor. The analysis did not show significant differences between PTC and NIFTP in exome sequencing data. The study was supported by the National Center of Research and Development MILESTONE project STRATEGMED 2/267398/4/NCBR/2015.
Short Abstract: Molecular profiling assays are becoming widely available and provide valuable information on tumor characteristics, which can identify targeted therapies or immunotherapies for cancer patients. However, the clinical utility of such tests remains unclear. At the University Hospital Zurich, the clinical utility and subsequent treatment alterations of the FoundationOne Comprehensive Genome Profiling Test (FOne) were analysed. A retrospective study (2017-2018) was conducted with 71 patients with solid tumors, the majority of whom (80%) presented with progressive disease. Therapies proposed by FOne were studied and examined for alterations in the therapeutic options under standard diagnostic care. 4 patients (6%) received a new therapy based on the FOne result. For 11 cases (15%), a new therapy option was identified by FOne, which due to the current treatment plan is considered for later use. 3 cases (4%) were evaluated for potential clinical trial enrollment. For an additional 6 patients (8%), the therapies proposed by FOne were already established on the basis of previous testing (e.g. smaller genomic panels, IHC, FISH). Overall, 18 (25%) patients received a new therapy option by FOne after standard of care diagnostics. Therapeutic alterations were observed particularly in patients with a rare or unknown tumor type.
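The percentages in the abstract above can be re-derived from the reported counts and the cohort size of 71. The short check below is only a sketch: the dictionary keys are paraphrased labels, and the reading that the overall figure of 18 is the sum of the first three groups (4 + 11 + 3) is an assumption rather than something the abstract states.

```python
cohort_size = 71

# Counts as reported in the abstract; the labels are paraphrased.
counts = {
    "new therapy received": 4,
    "new option kept for later use": 11,
    "evaluated for clinical trial": 3,
    "already established by earlier testing": 6,
}

# Percentages rounded to whole numbers, as in the abstract.
percentages = {label: round(100 * n / cohort_size) for label, n in counts.items()}

# Assumption: the "18 patients overall" are the first three groups combined.
new_option_total = (
    counts["new therapy received"]
    + counts["new option kept for later use"]
    + counts["evaluated for clinical trial"]
)
new_option_percent = round(100 * new_option_total / cohort_size)

for label, pct in percentages.items():
    print(f"{label}: {counts[label]}/{cohort_size} = {pct}%")
print(f"new therapy option overall: {new_option_total}/{cohort_size} = {new_option_percent}%")
```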
Short Abstract: Introduction: Anti-angiogenic drugs, such as bevacizumab, have been approved for treatment of patients with metastatic colorectal cancer (mCRC) since 2004. The lack of qualified response biomarkers makes the use of these drugs challenging, resulting in sub-optimal treatments. Methods: Plasma samples were collected during the first three weeks of treatment with cytotoxic chemotherapy and bevacizumab. Principal component analysis on 19 angiogenesis-related circulating proteins was used to shortlist proteins that occurred at altered levels following treatment. The trajectories were analysed using a Bayesian modelling approach and the outcomes jointly modelled with patient survival data. Results: Data from the 70 patients enrolled in this study were analysed. Placental growth factor (PlGF) is a member of the vascular endothelial growth factor (VEGF) family. The slope defining the trajectory of PlGF was significantly associated with progression-free survival (PFS) (HR = 0.015 [0.0004, 0.5389], p-value = 0.0216), while pre-treatment levels of PlGF were not. Conclusion: Early change in circulating levels of PlGF may represent a candidate response biomarker for patients with mCRC receiving chemotherapy and bevacizumab. Patients with a rapid decline in circulating levels of PlGF within the first three weeks of treatment have reduced PFS. This warrants further investigation in order to improve patient outcomes.
Short Abstract: In a clinical setting, small molecular signatures capable of identifying disease or a certain phenotype are of great practical use for cost reduction. We are interested in the use of DNA methylation (DNAm) sites as markers due to their high specificity for particular cell types. We therefore explore feature selection methods to obtain small DNAm signatures (< 10 sites) to perform cell type classification and computational deconvolution. Estimates of cellular content are then used in disease cohorts for cancer or fibrosis for patient stratification. In this work we use heuristics in combination with different machine learning algorithms to perform classification and computational tissue deconvolution. To achieve this, we have compiled and generated a dataset of DNA methylation array data from different purified cell lines (14 different cell types spanning over 500 samples, with a total of over 400,000 features). We use these data for feature selection and to train classifiers for the 8 cell types with enough samples for training. These classifiers can predict cell types in an independent dataset with 90-99% accuracy. Next, we use these cell-specific classifiers to perform computational tissue deconvolution on distinct clinical samples and show that, for example, stromal cells can separate patients with cirrhosis well.
Short Abstract: The practice of cancer precision medicine requires fine-grained knowledge of biological processes to explain patient-specific clinical phenotypes. However, due to complexity of contradictory mechanisms’ integration, many knowledge bases (KBs) do not include uncommon mechanisms that are specific for individual cases. Moreover, state-of-the-art knowledge representation (KR) methods still lack a capability to enable uncovering of implicit relationships without applying more complex reasoning techniques. To address these challenges, we introduce a novel informatics framework for reconciliation of disease pathways and their representation for cancer precision medicine analytics. We developed a consensus method to reconcile concordant as well as contradictory disease pathways (DPs) and to include them in a KB. We also introduced a multi-perspective representation method which allowed us to cross-link DPs through their context information (e.g., genomic alterations, drug action, etc.) and thus uncover latent relationships such as a common mechanism of V600E mutation in BRAF gene in Melanoma and Hairy Cell Leukemia DPs in an explicit way. Our method enabled formalization and inclusion into a KB a large spectrum of complex and heterogeneous biological mechanisms. We, therefore, conclude that the proposed framework can improve knowledge support for precision medicine practitioners by efficiently bringing the latest biomedical discoveries into clinical setting.
Short Abstract: Personalized medicine approaches for cancer therapy seek to determine optimal therapies for cancer patients based on the molecular profile of their individual tumour. The motivation is to target oncogenomic alterations in tumours with the appropriate therapies. However, it is currently infeasible to determine the optimal therapy simply given the genomic profile of a tumour. There has been significant recent work in attempting to use the computational approach of machine learning for predicting tumour drug response. Machine learning methods have been successfully used for drug response prediction in cancer cell lines and even have been extended to predicting individual cancer patient response to a small number of chemotherapies. This work uses support vector machines (SVM) to predict the response to chemotherapies of 570 advanced cancer patients from the BC Cancer Personalized OncoGenomics program using the transcriptomic profile of their tumours. The dataset presents a unique challenge as there are over 20 cancer types and over 50 unique chemotherapies used for these highly advanced cancers. F-measures for the SVMs were found to be as high as 0.83 for some cohorts. This work demonstrates the clinical value of large-scale sequencing projects, and the potential of machine learning in prescribing cancer drugs.
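The F-measure reported above is the harmonic mean of precision and recall for the positive class (here, responders). A minimal sketch of the computation follows, assuming binary labels with 1 for response; the function name and the toy labels are made up for illustration and are not data from the study.

```python
def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 = responded to therapy, 0 = did not respond.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]

score = f_measure(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
```

Unlike plain accuracy, the F-measure ignores true negatives, which makes it a common choice when responders are a minority within a cohort.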
Short Abstract: We present a new approach to multi-omics data integration developed as part of the PersonALL project, a collaborative project among the Polish Pediatric Leukemia/Lymphoma Study Group centers. PersonALL focuses on the molecular mechanisms of acute lymphoblastic leukemia, aiming at improved diagnostics, classification and treatment personalization. Our method integrates multi-omics data in two steps. The first step is feature selection, which can be applied to any of the following types of omics experimental data: gene expression, methylation or CNV. For each type of omics data, a different approach is designed that takes into account the specific characteristics of that data type. The second step uses a machine learning approach to combine the extracted features and perform tree-based survival analysis in order to obtain molecular markers for better patient stratification. The presented approach is available as a web application and is part of the PersonALL computer system, developed within the project to store patient clinical data and information about the treatment course. However, the system works independently, allowing users to submit and analyze their own datasets. A proof-of-concept example is presented, providing the results of an analysis of data from 109 leukemia patients in the TCGA database.
Short Abstract: Ebola virus (EBOV) is an emerging severe viral pathogen with a human mortality rate of up to 90%. Infection outcome can be improved with early diagnosis and treatment; however, quick and accurate diagnostic tests for EBOV infection are lacking. Transcriptomic data suggest that tracking the host response to viral infection can identify EBOV infection before current gold-standard assays and that the host response to EBOV is different from the response to other diseases. In this study, we analyzed longitudinal gene expression data obtained using a 700-gene NanoString probeset from non-human primates to identify changes in host gene expression after infection with one of three different viruses: EBOV, Venezuelan equine encephalitis virus (VEEV), and Zika virus (ZIKV). We compared gene expression changes pre- and post-infection for each virus and identified a small number of differentially expressed mRNAs that were upregulated during symptomatic stages, indicating a viral infection signal. Analysis is ongoing to identify differentially expressed mRNAs and enriched signaling pathways that are specific to EBOV, VEEV, or ZIKV infection. Our results show the potential of host gene expression analysis for the development of EBOV diagnostic clinical assays.
Short Abstract: Inflammatory breast cancer (IBC) is an understudied, aggressive, and rare form of breast cancer. We present results from a phase II clinical trial (NCT01796197) that examined the effect of two monoclonal antibodies targeting HER2, trastuzumab and pertuzumab (jointly, HP). Twenty-three HER2+ IBC patients were enrolled from Aug 2013 to June 2017, with a 43% (10/23) response rate. This study has two analytical objectives. The first is to determine whether multidimensional machine learning modeling provides a more informed framework than a standard low-dimensional RNA-Seq analysis that compares single genes between biologically relevant groups. The second is to determine whether an RNA-Seq based gene signature from the Day 8 (D8) biopsy, collected after treatment, is a better predictor than the Day 1 (D1) baseline. A Random Forest model was trained to predict treatment response from D1 or D8 mRNA expression and evaluated using leave-pair-out cross-validation. Across all metrics (Accuracy, MCC, AUC, Precision, Recall, F-score), the D8 model separates responders from non-responders significantly better than the D1 model (p-value: 1.0 × 10^-15), showing improvement with each additional gene. These findings may have implications for the assessment of immunotherapy for patients with HER2-positive IBC, but require further validation in a larger IBC cohort.
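Leave-pair-out cross-validation, as mentioned above, holds out one responder and one non-responder at a time, trains on the remaining samples, and counts how often the held-out responder receives the higher score; the fraction of correctly ordered pairs estimates the AUC. The sketch below illustrates the splitting scheme only. It assumes binary labels (1 = responder), and the one-dimensional `fit` and `score` functions are hypothetical stand-ins, not the Random Forest used in the study.

```python
from itertools import product


def leave_pair_out_splits(labels):
    """Yield (train_indices, (pos_index, neg_index)) for every positive/negative pair."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    for p, n in product(pos, neg):
        train = [i for i in range(len(labels)) if i not in (p, n)]
        yield train, (p, n)


def leave_pair_out_auc(features, labels, fit, score):
    """Fraction of held-out pairs in which the positive sample scores higher."""
    correct, total = 0.0, 0
    for train, (p, n) in leave_pair_out_splits(labels):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        s_pos, s_neg = score(model, features[p]), score(model, features[n])
        correct += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
        total += 1
    return correct / total


# Hypothetical stand-in model: learns only the direction of a single feature.
def fit(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0


def score(direction, x):
    return direction * x


# Cohort shape from the abstract: 10 responders and 13 non-responders.
n_pairs = sum(1 for _ in leave_pair_out_splits([1] * 10 + [0] * 13))

# Toy, perfectly separable one-dimensional data.
toy_auc = leave_pair_out_auc([3.0, 4.0, 5.0, 0.0, 1.0, 2.0], [1, 1, 1, 0, 0, 0], fit, score)
```

With 10 responders and 13 non-responders there are 10 × 13 = 130 held-out pairs, each leaving 21 samples for training, which is one reason the scheme suits very small cohorts.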
Short Abstract: Identifying robust biomarkers of drug response constitutes one of the key challenges in precision medicine. Patient-derived tumor xenografts (PDXs) have emerged as reliable preclinical models since they better recapitulate tumor response to chemo- and targeted therapies. However, the lack of computational tools makes it difficult to analyze PDXs' high-throughput molecular and pharmacological profiles. We have developed Xeva (XEnograft Visualization & Analysis), an open-source software package for in vivo pharmacogenomic datasets that allows for the quantification of the variability in gene expression and pathway activity across passages. By using our package, we showed that gene and pathway activity is consistent across different PDX passages. Using the largest PDX pharmacogenomic dataset to date, we identified 87 pathways that are significantly associated with response to 51 drugs (FDR<0.05). We found novel biomarkers based on gene expressions, copy number aberrations and mutations predictive of drug response (concordance index>0.60; FDR<0.05). Our study demonstrates that Xeva provides a flexible platform for integrative analysis of preclinical in vivo pharmacogenomics data to identify biomarkers predictive of drug response, a major step toward precision oncology.
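The concordance index used as a threshold above measures how often a biomarker orders pairs of samples the same way as the observed drug response: 0.5 corresponds to random ordering and 1.0 to perfect agreement. The sketch below is a plain pairwise version under simplifying assumptions (continuous responses, no censoring, ties in the predictor counted as half); it is not taken from the Xeva package, whose exact definition may differ, and the values are made up.

```python
def concordance_index(predicted, observed):
    """Share of comparable sample pairs that the predictor orders like the outcome."""
    concordant = 0.0
    comparable = 0
    n = len(observed)
    for i in range(n):
        for j in range(i + 1, n):
            if observed[i] == observed[j]:
                continue  # tied outcomes carry no ordering information
            comparable += 1
            d_obs = observed[i] - observed[j]
            d_pred = predicted[i] - predicted[j]
            if d_pred == 0:
                concordant += 0.5  # tied prediction counts as half
            elif (d_obs > 0) == (d_pred > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all observed values are tied")
    return concordant / comparable


# Toy values: hypothetical biomarker levels and observed response scores.
biomarker = [0.1, 0.4, 0.35, 0.8]
response = [1, 2, 3, 4]

ci = concordance_index(biomarker, response)  # 5 of the 6 pairs are ordered correctly
```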
Short Abstract: Diagnostic errors are common, and it has been estimated that everyone will experience at least one diagnostic error in their lifetime. Common and potentially harmful diagnostic errors include both under-, mis- and over-diagnoses. We use Chronic Obstructive Pulmonary Disease (COPD) to showcase a new approach to identifying faulty diagnoses. Underdiagnosis of COPD has been estimated to be 50-80%, while 5-60% are misdiagnosed. Earlier studies evaluate the airway obstruction in population-wide data and identify undiagnosed patients as people with airway obstruction but no COPD diagnosis, while misdiagnosed patients have a COPD diagnosis, but no airway obstruction. Here we use an alternative disease trajectory approach. We used the Danish National Patient Registry containing hospital diagnoses for the entire Danish population (6.9 million people) covering more than two decades. COPD patients were used to identify common, temporal diagnostic correlations that were combined to longer disease trajectories. These typical disease trajectories were compared with non-COPD patients to discover undiagnosed COPD. Low similarity between COPD patients and typical disease trajectories identified mis- and over-diagnosed patients. The method is entirely general and not limited to COPD and could improve the diagnostics processes to reduce errors, avoid harm, and find optimal diagnoses faster in population-wide health data.
Short Abstract: In 2012, an international consensus paper identified four molecular subgroups of medulloblastoma (WNT, SHH, Group 3 and Group 4), each associated with distinct clinico-molecular features. Subsequently, three independent reports defined additional intra-subgroup heterogeneity. However, owing to differences in cohorts and methodologies, estimates of subtype number and definition were inconsistent, especially within Groups 3/4. We aimed to reconcile the definition of Group 3/4 medulloblastoma subtypes through the analysis of 1501 medulloblastomas with DNA methylation profiling, including 852 with matched transcriptome data. Using multiple complementary bioinformatic approaches, we compared the concordance of subtype calls between published cohorts and analytical methods, including assessments of class definition confidence and reproducibility. While still identifying the original consensus Groups 3/4, our analysis most strongly supported a refined definition comprising eight subtypes (I-VIII). All subtypes were supported by 2/3 class-definition methods. There were significant, subtype-specific enrichments of driver-gene alterations and cytogenetic events, and subtypes had disparate survival outcomes. Collectively, this study provides continued support for consensus Groups 3/4, whilst robustly defining their extensive intertumoral heterogeneity. Outputs from this study will help shape the definition of the next generation of medulloblastoma clinical protocols and facilitate the application of enhanced molecularly-guided risk stratification to improve outcomes for patients.
Short Abstract: Single-cell RNA-seq (scRNA-seq) is an established tool to measure gene-expression levels of heterogeneous cellular systems, and it has recently been applied to tissue samples from patients affected by autoimmune diseases. Furthermore, a number of novel latent factor models that explicitly account for the sparsity and high dimensionality of scRNA-seq data have been proposed for unsupervised discovery of molecular signatures. Here, we aim to benchmark four state-of-the-art latent factor models (Coordinated Gene Activity in Pattern Sets, Hierarchical Poisson Factorization, single cell Variational Autoencoder and Latent Dirichlet Allocation) applied to two scRNA-seq datasets from lupus nephritis (LN) and rheumatoid arthritis (RA) patients, from the NIH-AMP consortium. By evaluating the robustness, stability and magnitude of biological signal retrieved by these algorithms, we select Hierarchical Poisson Factorization as the best performing method. Furthermore, we show how this approach is able to retrieve known and novel gene signatures associated with the disease state, such as a B cell- and T cell-specific type I Interferon response in LN, and a chondroitin sulfate metabolism signature for a specific sub-population of fibroblasts in RA. Our results highlight the power of these methods to identify, in a completely unbiased fashion, molecular signatures from scRNA-seq data.
Short Abstract: Dementia is defined by several symptoms including deficits of memory and cognitive function, affecting the daily life of more than 47 million people worldwide. There are currently no curative treatments available and there is a lack of scientific consensus on both direct and inverse comorbidities in patients affected by dementia. One reason is the lack of an optimal stratification into meaningful subgroups, which could improve the classification of dementia and consequently provide better management of patients, especially in the early treatment of their comorbidities. We propose an approach to redefine dementia patients by reconstructing their shared comorbidities using laboratory tests, prescribed drugs, clinical notes and their disease history. We applied the PhenoGraph method, which converts the data to a graph that represents the phenotypic similarities between patients, calculates the Jaccard coefficient between nearest-neighbor sets, builds an undirected graph from the weighted links and identifies communities using the Louvain method on the graph. Each cluster defines a cohort of patients, which is the input of the algorithm to discover comorbidities. We were able to identify novel subgroups characterised by specific clinical features. This will help us to better stratify dementia patients and improve the decision-making process.
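The graph-construction step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code or the PhenoGraph package: the function names, the toy feature vectors and the choice of Euclidean distance are assumptions for illustration, and the Louvain community-detection step is deliberately left out.

```python
from itertools import combinations
from math import dist

def knn_sets(points, k):
    """Return, for each point index, the set of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        neighbours.append(set(others[:k]))
    return neighbours

def jaccard_graph(points, k):
    """Weighted undirected graph: edge weight = Jaccard overlap of the two kNN sets."""
    nbrs = knn_sets(points, k)
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        # Only link pairs where at least one point is a nearest neighbour of the other.
        if j in nbrs[i] or i in nbrs[j]:
            shared = len(nbrs[i] & nbrs[j])
            union = len(nbrs[i] | nbrs[j])
            if shared:
                edges[(i, j)] = shared / union
    return edges

# Hypothetical toy "patients": two well-separated groups in a 2-D feature space.
patients = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = jaccard_graph(patients, k=2)
```

In this toy input every edge stays within one of the two groups, so a community-detection method run on the resulting weighted graph would be expected to separate them.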
Short Abstract: Diagnosis and treatment of cancer patients normally depend on the histological inspection of tumor biopsy samples. However, such a diagnostic strategy encounters important challenges related to tumor heterogeneity and failures to distinguish between clinically relevant subtypes of cancer. Fortunately, recent advances in NGS technologies and biomedical imaging modalities enable the generation of high-throughput omics data of imaging traits and genetic profiles that can be further utilized for better characterizing cancer at the molecular level, allowing more accurate diagnosis than the histopathological approaches for targeted treatments. To this end, we present a pilot radiogenomic study that comprises joint analysis of clinical parameters, imaging features, and expression profiles of coding and non-coding RNAs of prostate cancer patients. Four distinguishing biomarkers were identified between two frequently occurring tumor stages and were highly correlated with aggressiveness-related imaging features. This preliminary result could hint at devising a radiogenomic association map that correlates tumor radiographic features to the underlying genetic makeup. Furthermore, the combined feature set of both annotated clinical parameters and detected molecular features markedly improved the prediction accuracy for the corresponding pathological stage.
Short Abstract: Dosage guidelines are often broadly defined, and individual dosage regimens strongly rely on the physician's experience, based on the patient's weight, height and conditions that may influence the drug's absorption and excretion. However, complex multimorbid scenarios are difficult to study before a drug is introduced on the market. Thus, modifications of the drug characteristics referring to new adverse drug reactions and dosage adjustments often occur after market introduction. Individualization of dosage regimens to a particular patient or group of patients is critical for optimal therapy, maximizing benefits and minimizing unwanted side effects. Previous studies have centered their attention on the polymedicated elderly population and have studied individual drug regimens, but no previous studies have aimed at characterizing dosage patterns in the whole population from a polypharmacological perspective. In this study, we present the dosage polypharmacology map using data from an electronic health medication registry covering two main regions of Denmark from January 1st, 2006, to June 30th, 2016. The resulting dosage map has the potential to be used for predicting how drugs influence each other, suggesting that a patient's polypharmacy profile is of significant consideration for improving dosage regimens.
Short Abstract: We integrated genomic and transcriptomic profiling with ex vivo drug responses to identify and prospectively translate targeted therapeutics in AML. Comprehensive response profiles of 515 emerging and clinical cancer drugs from 252 consecutive patient samples revealed 14% of effective and approved drugs in AML. We identified 142 statistically significant associations between drug responses and mutations, including increased sensitivity to JAK inhibitors in patients with NPM1 and IDH1/2 double mutation. The molecular features were tested to predict drug responses and suggested that gene expression has better predictive power than mutations and fusions. The molecular subset-specific analyses revealed transcriptomic features significantly associated with drug responses, e.g. HOX gene overexpression and sensitivity to JAK inhibitors (in NPM1 mutant AML) and AHR overexpression and MEK sensitivity (in RAS wt AML). In a prospective clinical study, we implemented the utility of molecular denominators to tailor targeted treatments for late-stage chemorefractory AML patients. In 39% of 26 treatment occasions, complete clinical remission or leukemia-free status was achieved. Our proof of concept, real-time precision systems medicine study provides a paradigm for rapid tailoring of therapies for cancer patients in an era of exploding molecular information and emerging oncology drugs.
Short Abstract: Polygenic scores for lab results (liver function tests, cholesterol, and complete blood counts), derived from the UK Biobank and from Biobank Japan, replicate in our Electronic Health Record based Michigan Genomics Initiative. Findings include: polygenic scores for some of these lab-result-traits replicate trans-ethnically, while others do not; polygenic scores generally replicate even in individuals with lab values perturbed by disease states, as identified through the electronic health record; and, extreme values of polygenic score are generally associated with a greater-than-expected relative risk of extreme lab test values. Trans-ethnic replicability or lack-thereof must be driven by biological features of the different traits in different populations or at specific loci, since the study samples are essentially the same across traits. Replication of polygenic score effect sizes in disease-states supports that heterogeneity across genetic principal components in the original biobanks (reported elsewhere) is likely to be a genetic-background and not environmental effect, if the environmental effect of disease is assumed to be a substantial example. And, finally, the "heavy tailed" effect on lab values is consistent with the extreme increase in risk seen in high quantiles of polygenic scores for binary traits (reported prominently for heart disease).
Short Abstract: Chronic obstructive pulmonary disease (COPD) is categorized into four stages by GOLD criteria (mild, moderate, severe, and very severe) using spirometry measurements. While this staging generally recapitulates disease severity, it does not always correlate with clinical manifestations or disease progression. The GOLD criteria further suffer from inconsistent spirometry measurements, which may vary significantly even within a single individual over a short period of time. Hence, there is growing interest in combining multiple objective disease-related variables to identify relevant subsets of patients associated with disease severity. Using an unsupervised clustering method, we utilized features obtained from quantitative CT scans of patients from the COPDGene study (N=5273) and identified three novel imaging clusters of patients that are phenotypically distinct. The three clusters (preserved, interstitial predominant, and emphysema predominant) were replicated in the Detection of Early lung Cancer Among Military Personnel (DECAMP) cohort (N=360). Matched bronchial airway gene expression (total RNA-seq) for 146 individuals was used to further explore the molecular basis of the clusters. Using differential gene expression analysis, gene set enrichment analysis, and gene set variation analysis, we aimed to elucidate determinants of molecular processes that correlate with COPD severity and possible mechanisms that contribute to disease progression.
Short Abstract: Diabetic retinopathy (DR) is a cause of acquired blindness in diabetes mellitus (DM) patients. In this study, we performed a comprehensive antibody-based verification of 18 previously identified DR biomarker candidates. In the first stage, we verified the 18 biomarker candidates, which were selected from previous MRM data (AUC > 0.7 or P-value of t-test < 0.05), by Western blot and ELISA, for which we used plasma samples from patients with no diabetic retinopathy (NO DR) and Mild&Moderate nonproliferative diabetic retinopathy (MI&MO NPDR). In the second stage, we designed, constructed, and evaluated a systematic statistical pipeline for establishing a multimarker panel. In the final stage, we applied this pipeline to our statistical analysis for antibody-based candidate verification. In the statistical analysis, F-test and stepwise MANOVA were first performed to select multimarker proteins that contributed to the discriminatory power between NO DR and MO NPDR; then, we examined the error rates of several prediction models by leave-one-out cross-validation (LOOCV), for which linear discriminant analysis (LDA), support vector machine (SVM), and logistic regression (LR) were used. Based on the statistical analysis, we proposed 3 panels of markers: single, 2-protein, and 4-protein.
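The leave-one-out cross-validation step mentioned above can be sketched as follows. This is only an illustration under stated assumptions: a simple nearest-centroid classifier stands in for the LDA, SVM and LR models used in the study, and the two-marker measurements and helper names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean (centroid) is closest to x."""
    best_label, best_d = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        d = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if d < best_d:
            best_label, best_d = label, d
    return best_label

def loocv_error(xs, ys, predict):
    """Leave-one-out cross-validation: hold out each sample once, train on the rest."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)

# Hypothetical two-marker measurements; values and labels are illustrative only.
xs = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [3.0, 0.5], [3.1, 0.7], [2.8, 0.4]]
ys = ["NO DR", "NO DR", "NO DR", "NPDR", "NPDR", "NPDR"]
error_rate = loocv_error(xs, ys, nearest_centroid_predict)
```

Because `loocv_error` takes the prediction function as an argument, the same loop could be reused to compare the error rates of different models on different candidate marker panels.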
Short Abstract: Bioinformatics workflows are integral to the automated processing and analysis of diverse data from multiple sources, accumulated throughout the research and clinical study phases of drug candidate investigation. Immunotherapy R&D is one example of complex research where bioinformatic approaches help to advance knowledge about novel therapies. The discovery of patient-specific high-affinity neoepitopes, required to target treatment appropriately, is challenging, as several genomic attributes of the tumor and germline cells need to be measured and integrated to derive accurate predictions. In this poster, we present an automated neoepitope prediction pipeline operating within the enterprise software platform Genedata Profiler®. It allows the prediction and selection of the most specific patient neoepitopes by somatic variant calling, estimation of mutated protein expression levels and patient-specific MHC genotype calling from paired tumor-normal tissue samples. The customizable pipeline can run on most scalable cloud and HPC environments, and includes the regulatory features required for clinical trials, with the results submitted directly to regulatory authorities, if desired. We demonstrate the scientific validation of the pipeline by reproducing and confirming the results from a published medical case of a metastatic breast cancer patient successfully treated with tumor-infiltrating lymphocytes reactive against four patient-specific mutated proteins.
Short Abstract: The association of pyruvate kinase muscle type (PKM) with survival of cancer patients is controversial. Here, we focus on different transcripts of PKM and investigate the association between their mRNA expression and the clinical survival of patients in 25 different cancers. We find that the transcript encoding PKM2 and three other functional transcripts are prognostic in multiple cancers. Our integrative analysis shows that the functions of these four transcripts are highly conserved across different cancers. Next, we validate the prognostic effect of these transcripts in an independent kidney renal clear-cell carcinoma (KIRC) cohort and identify a prognostic signature that can distinguish high- and low-risk KIRC patients. Finally, we reveal the functional role of alternatively spliced PKM transcripts in KIRC, and discover the protein products of different transcripts of PKM. Our analysis demonstrates that alternatively spliced transcripts of not only PKM but also other genes should be considered in cancer studies, since this may enable the discovery and targeting of the right protein product for the development of efficient treatment strategies.
Short Abstract: ccRCC is characterised by the biallelic inactivation of the Von Hippel-Lindau tumour suppressor gene. However, wtVHL ccRCCs form 5-12% of all ccRCC. These tumours are more aggressive, with poorer prognosis and worse survival. We aim to identify the pathways by which wtVHL ccRCCs develop and the features distinguishing them from -/-VHL ccRCC. Of the 242 samples within the TCGA ccRCC dataset with WES, CNA and methylation data, 9 were found to be wtVHL ccRCC, of which 1 belongs to the new TCEB1 ccRCC subtype. 12 wtVHL ccRCC samples from the UZH biobank have also been identified, of which 4 belong to the TCEB1 subtype. To tackle the low sample sizes, an individualised multi-omics analysis is conducted on this subgroup of tumours (bottom-up), along with a pan-cancer analysis to determine the location of these tumours within the spectrum of tumour biology using Cancer Integration via Multikernel Learning (top-down). We find subtle differences distinguishing wtVHL ccRCCs and potential prognostic markers via the analysis of single-omics datasets. Extracellular components, transport channels and their interaction with the extracellular matrix may play a vital role in the progression and aggressiveness of these tumours; these findings are further supported by the application of an integrative approach using multiple omic datasets.
Short Abstract: Triple Negative Breast Cancer (TNBC) is an aggressive subtype of breast cancer, and standard biomarkers for treatment and prognosis are currently absent. This study presents a novel approach for biomarker discovery which predicts recurrence and pathologic Complete Response (pCR) of TNBC using differential gene expression measured by the Nanostring nCounter immunology panel. Using edgeR for feature selection, we extract nine and thirteen differentially expressed genes (DEGs), characterized by pCR and relapse respectively, from the 579 immune panel genes. For the prediction of pCR and relapse patients, Random Forest models were built with the DEGs. The prediction models performed robustly, with moderate accuracy of 0.74 and 0.88 for pCR and relapse, respectively, and passed a randomization test (empirical p-value for pCR: <0.015, relapse: <0.018). Six-gene (CD1A, GNLY, CCL5, FCER1A, CCL20, SELE) and three-gene (IL17B, EDNRB, TGFBI) signatures serve as predictors of pCR and relapse, respectively, in TNBC.
Short Abstract: Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive malignancy and a leading cause of cancer mortality worldwide. Currently, surgical resection remains the only definitive treatment for PDAC. The long-term goal of this project is to find off-label drugs that can be offered as additional personalized treatments to PDAC patients and to potentially identify biomarkers that predict drug responses in patients. Over the last year, a biobank of over 30 pancreatic cancer patient-derived organoid lines has been established at ETH Zürich. In a small pilot study, we characterized a set of these previously established pancreatic cancer organoid lines. We used high-throughput drug screening to identify drugs that prevent the growth of the organoids. Furthermore, WES, WGS, and RNA-seq were performed on a subset of the PDAC organoid lines and corresponding wild-type tissue to allow molecular profiling, including somatic variant calling (SNVs, InDels, CNVs) and gene expression analysis. Our analysis shows that mutational profiles of the patient-derived organoids correspond to mutational profiles of sequenced patient cohorts. Moreover, in silico prediction of PDAC organoid drug response based on the organoids’ molecular profiles shows promising results.
Short Abstract: High throughput screening (HTS) is a method to screen a set of chemical compounds against patient-derived cells or cell lines for a potential in vivo response. Automation and HTS techniques play a crucial role in drug sensitivity and resistance testing (DSRT) platforms used to profile drug responses in certain groups of patients. While commercial and academic data analysis platforms are available, challenges remain in integrated data analysis with detailed quality control (QC) checks for identifying systematic errors, drug response scoring and reporting. To fill this gap, we have built Breeze - an open access DSRT data analysis platform. Breeze accepts raw data in tabular format and generates in-depth QC metrics, dose response curve fits, and quantitative drug response metrics such as IC50/EC50 and drug sensitivity scores. The reports are generated in the form of interactive, informative and intuitive visualizations such as detailed QC metrics with emphasis on performance of controls, plate heatmaps, row-wise and column-wise scatterplots, barplots with top hits, multi-screen heatmaps with several clustering options, and circular plots. Breeze helps run multiple screens simultaneously and thus helps in identifying potential hits, patient stratification and systematic data analysis. Breeze is built with R and accessible at https://breeze.fimm.fi
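To make the notion of a dose response curve fit and an IC50 estimate concrete, here is a minimal sketch. It does not reproduce Breeze's implementation (which is built with R): it assumes a two-parameter Hill model with the top fixed at 100% and the bottom at 0% viability, fits it by a coarse grid search, and uses a hypothetical noise-free dilution series.

```python
import math

def hill(conc, ic50, slope, top=100.0, bottom=0.0):
    """Percent viability at a given concentration under a Hill (logistic) model."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** slope)

def fit_ic50(concs, responses):
    """Coarse grid search over (IC50, slope) minimising the sum of squared errors."""
    lo, hi = math.log10(min(concs)), math.log10(max(concs))
    best = (float("inf"), None, None)
    for i in range(201):                      # IC50 candidates, log-spaced over the tested range
        ic50 = 10 ** (lo + (hi - lo) * i / 200)
        for s in range(1, 41):                # slope candidates 0.1 .. 4.0
            slope = s / 10
            sse = sum((hill(c, ic50, slope) - r) ** 2
                      for c, r in zip(concs, responses))
            if sse < best[0]:
                best = (sse, ic50, slope)
    return best[1], best[2]

# Hypothetical five-point dilution series (arbitrary concentration units), noise-free.
concs = [0.01, 0.1, 1.0, 10.0, 100.0]
responses = [hill(c, ic50=1.0, slope=1.0) for c in concs]
ic50, slope = fit_ic50(concs, responses)
```

A grid search keeps the sketch dependency-free; a production pipeline would more likely use a nonlinear least-squares optimiser and also fit the top and bottom asymptotes.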
Short Abstract: Although a large part of drug risk assessment is carried out during clinical trials, with the aim of obtaining as much information as possible before introducing the drug on the market, it is often incomplete due to limited sample sizes, strict selection criteria or short duration. At that time the full safety profile of the drug is still unknown. Moreover, the simultaneous intake of and exposure to more than one compound and/or drug is an increased risk in the patient population. Polypharmacy has been associated with adverse drug reactions (ADRs) and severe outcomes such as increased stay in hospital, higher readmission rates soon after discharge and mortality. In order to assess drug efficacy and adverse outcomes, detailed knowledge of how a drug is used, of the treatment combinations and of the stratification of patients based on treatment is required. We analyzed hospital data for approximately 2.7 million patients in the Capital Region of Denmark and Region Zealand, of which more than 1.5 million contained treatment information. We performed a longitudinal analysis of the polypharmacy landscape and a multimorbidity analysis in this cohort, aiming at illuminating combinations with increased relative risk of long-term toxic outcomes, and downstream drug-disease associations.
Short Abstract: The microenvironment of solid tumors provides insight into their pathogenesis. Here we used MACSima technology, an automated and highly multiplexed immunofluorescence imaging approach, to characterize the composition and location of intra-tumorigenic cells in a pancreatic tumor biopsy sample. The technology returns a multi-marker dataset composed of a stack of microscopy images of the same tissue section. We have developed an analysis pipeline using R with components typically applied to single-cell-omics approaches to characterize the cancer biopsy. Both pixel and segmented single-cell data were used for antigen quantification and pattern recognition. The analyses included correlations, clustering, and dimensionality reduction by UMAP or tSNE. The components can easily be extended. Using this pipeline, putative novel tumor markers were identified by their correlation with known tumor markers. Yet, their anticorrelation provided information about infiltrating immune cells. They co-localized in defined regions within the tumor. Their expression profiles suggest pathways associated with the pathogenesis. The increased marker feature space of MACSima enables a holistic understanding of distinct types and locations of immune and tumor cells within a tumor section. The automated image analysis pipeline allows for an improved and fast detection of cell types and disease phenotypes.
Short Abstract: Obesity is the second most frequent preventable risk factor for cancer, after smoking. However, the exact molecular mechanisms initiating obesity-related cancer remain unknown. We aimed at identifying molecular pathways potentially affected by obesity across cancer types through investigating mutational signatures. Analyzing TCGA datasets from 6 tissue types, we separated single nucleotide variants into ~40 mutational signatures, including 28 signatures of known etiology, signatures of technical origin and signatures of unknown etiology. Multistep linear regression models were used to assess BMI association with mutational signatures, gender, age, tumor infiltration and known driver genes. We identified signatures of mismatch repair (SBS15) and ageing (SBS1) to be associated with obesity in liver, while SBS1 has not been associated with age in this tissue, concordant with previous reports. In colon, we identified a BMI association with SBS8, thought to be associated with deficiencies in homologous recombination or nucleotide excision repair, and SBS39 of unknown etiology. Thus, we have identified mutational signatures associated with BMI, which suggest molecular pathways affected by obesity in different tissues. Molecular understanding of the mechanisms triggering obesity-related cancers can serve to develop effective cancer prevention programs and cancer treatment to account for this emerging risk factor affecting increasingly large parts of the population.
Short Abstract: Precision cancer medicine approaches are typically focused on searching for ‘actionable’ mutations in cancer genes, aiming at their therapeutic targeting. However, identifying novel genetic interactions between cancer genes may open new drug treatment opportunities. We studied two fundamental types of genetic interactions: the well-known synthetic lethal interactions, describing the relationship between two genes whose combined inactivation is lethal to the cell; and the less-known synthetic rescue (SR) interactions, where a change in the activity of one gene is lethal to the cell but an alteration of its SR partner gene rescues cell viability. We shall describe a new approach for the data-driven identification of these genetic interactions by directly mining patients’ tumor data. Applying it to analyze the Cancer Genome Atlas (TCGA) data, we have identified the first pan-cancer genetic interaction networks shared across many types of cancer, which we then validated via existing and new experimental in vitro and in vivo screens. We find that: (a) synthetic lethal interactions offer an exciting avenue for personalized selective anticancer treatments, enabling the prediction of patients’ drug response and providing new selective drug target candidates, and (b) targeting synthetic rescue genes can mitigate resistance to primary cancer therapy, including both targeted and immunotherapy.
Short Abstract: ELIXIR-LU is the Luxembourgish Node of the European bioinformatics research infrastructure ELIXIR. ELIXIR-LU is focussed on translational medicine, aiming to increase the sustainability and re-usability of clinical, associated omics, imaging and mobile/sensor data. Our services centre around FAIRification of translational medicine data – data that are Findable, Accessible, Interoperable and Reusable. We present recent additions to our research data services as they support this mission, in adherence to ELIXIR's open access/free service model. FAIR data begin at collection – acquisition in the clinic must be based on electronic data capture with controlled vocabulary following international standards such as CDISC to make the data Interoperable. Accessibility is enabled through free hosting of translational medicine data in ELIXIR-LU if open to the research community. Re-usability is enhanced by support for harmonising and uploading the data to integration platforms for clinical and molecular data that also offer relevant analysis tools (e.g. tranSMART). A metadata catalogue for translational medicine data makes the data Findable, further strengthened by the Beacon tool designed to search for genetic variants in the genomes stored within ELIXIR. All these services are complemented with GDPR compliance tools ensuring comprehensive data protection around these sensitive human datasets (poster by Pinar Alper et al).